Word-Formation Network for Czech

نویسندگان

Magda Sevcíková

Zdenek Zabokrtský

چکیده

In the present paper, we describe the development of the lexical network DeriNet, which captures core word-formation relations on the set of around 266 thousand Czech lexemes. The network is currently limited to derivational relations because derivation is the most frequent and most productive word-formation process in Czech. This limitation is reflected in the architecture of the network: each lexeme is allowed to be linked up with just a single base word; composition as well as combined processes (composition with derivation) are thus not included. After a brief summarization of theoretical descriptions of Czech derivation and the state of the art of NLP approaches to Czech derivation, we discuss the linguistic background of the network and introduce the formal structure of the network and the semi-automatic annotation procedure. The network was initialized with a set of lexemes whose existence was supported by corpus evidence. Derivational links were created using three sources of information: links delivered by a tool for morphological analysis, links based on an automatically discovered set of derivation rules, and on a grammar-based set of rules. Finally, we propose some research topics which could profit from the existence of such lexical network.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

COMPUTATIONAL LEXICOGRAPHY AND LEXICOLOGY Computational Processing of Czech Derived Words

Abstract The system presented in this paper is concerned with the computational processing of the selected types of Czech word-formation. The developed programming tool (word-formation module) aims at analysing and synthesising Czech derived words. Such a system is of particular value for automatic processing of Czech language where derivational morphology plays an important role in regular wor...

متن کامل

Enriching WordNet with Derivational Subnets

In this paper, we deal with the derivational (word formation) relations as they are handled by the Czech morphological module Ajka. First, we show that they represent empirically well-based semantic relations forming small semantic networks, and then we solve the problem how to integrate them into lexical database such as (Czech) WordNet. In this respect we examine the relation between the deri...

متن کامل

Neural Network Based Named Entity Recognition

Czech named entity recognition (the task of automatic identification and classification of proper names in text, such as names of people, locations and organizations) has become a well-established field since the publication of the Czech Named Entity Corpus (CNEC). This doctoral thesis presents the author’s research of named entity recognition, mainly in the Czech language. It presents work and...

متن کامل

The Core of the Czech Derivational Dictionary

Amongst all available language resources for the Czech language one can find a lot of useful dictionaries, databases and corpora. There are machine readable dictionaries of literary Czech (Havránek, 1989; Filipec, 1998), the dictionary of Czech synonyms (Pala, 2000) and two encyclopaedia: Otto and Diderot. Moreover, Czech researchers have two morphological databases (Hajič, 2001; Sedláček and S...

متن کامل